Abstract

The Markov Chain Monte Carlo (MCMC) method, of which the Gibbs sampler is a special
case, is a very powerful method for simulating from complicated distributions arising
in many contexts, including image analysis and computational Bayesian analysis.
Existing results that ensure that this method will converge involve conditions which
are difficult to verify in practice, and most practitioners, convinced that their
particular problem will not be pathological, give up verifying them altogether. This
paper gives a new set of sufficient conditions which are easy to verify in most
applications.

1 The First Example

We begin with a familiar example on blood group data in human populations, found in
Rao (1965), "Linear Statistical Inference and its Applications", and use it to illustrate
Markov Chain Monte Carlo methods. Every human being can be classified into one of four
blood groups O, A, B and AB. The inheritance of these blood groups is controlled by three
allelomorphic genes O, A and B, where O is recessive to both A and B. If r, p and q are
the gene frequencies of O, A and B, then the probabilities of the six genotypes and the
four phenotypes, under random mating, together with typical data on a human population
of size N, can be represented by the following table:


Group                      Probabilities                 Frequency
Phenotype   Genotype       Phenotype      Genotype       Observed   Unobserved

O           OO             r^2            r^2            n(O)
A           AA             p^2 + 2pr      p^2            n(A)       n(AA)
            AO                            2pr                       n(A) - n(AA)
B           BB             q^2 + 2qr      q^2            n(B)       n(BB)
            BO                            2qr                       n(B) - n(BB)
AB          AB             2pq            2pq            n(AB)

Totals                     1              1              N

Here n(O), n(A), n(B) and n(AB), which will be called the data, are the observed
frequencies of the four blood groups in a population of size N. The frequencies n(AA) and
n(BB) of the genotypes AA and BB cannot be observed. The problem is to estimate the
probabilities p, q and r.

The data follow a simple multinomial distribution with 4 cells, where the cell
probabilities are functions of the parameters of interest, and the likelihood is
proportional to

$$ r^{2n(O)} \, (p^2 + 2pr)^{n(A)} \, (q^2 + 2qr)^{n(B)} \, (2pq)^{n(AB)}. $$

The maximum likelihood equations are not easy to solve directly and Rao (1965, pp.
305-309) suggests the standard method of scoring to obtain the maximum likelihood
estimates. How will a Bayesian approach this problem? Since $p + q + r = 1$, it is natural
to put $\mathcal{D}(\alpha_1, \alpha_2, \alpha_3)$, the Dirichlet distribution with
parameters $\alpha_1 > 0$, $\alpha_2 > 0$, $\alpha_3 > 0$, as a prior distribution for
$(p, q, r)$. The next step is to obtain the posterior distribution conditional on the
data. Once again, this turns out to be an intractable problem. However, if the unobserved
frequencies n(AA) and n(BB) were available, then the posterior of $(p, q, r)$ given
(data, n(AA), n(BB)) would be easy to obtain. Note that the likelihood of the data, n(AA)
and n(BB) again comes from a multinomial distribution, now with 6 cells, and is
proportional to

$$ p^{\,n(AA) + n(A) + n(AB)} \; q^{\,n(BB) + n(B) + n(AB)} \; r^{\,2n(O) + n(A) - n(AA) + n(B) - n(BB)}. $$

Denote $(p, q, r)$ by $Y = (Y_1, Y_2, Y_3)$, and $(n(AA), n(BB))$ by $Z = (Z_1, Z_2)$.
From the above remarks, the conditional distribution of $Y$ given (data, $Z$) can be
written as

$$ \mathcal{L}\{Y \mid \text{data}, Z\} = \mathcal{D}(\alpha_1', \alpha_2', \alpha_3') \tag{1.1} $$

where

$$ \alpha_1' = \alpha_1 + n(AA) + n(A) + n(AB), $$

$$ \alpha_2' = \alpha_2 + n(BB) + n(B) + n(AB) \quad \text{and} $$

$$ \alpha_3' = \alpha_3 + 2n(O) + n(A) - n(AA) + n(B) - n(BB). $$

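It is easy to write down the conditional distribution of the unobserved frequencies $Z$
given (data, $Y$) as

$$ \mathcal{L}\{Z \mid \text{data}, Y\} = B\!\left(n(A), \frac{p^2}{p^2 + 2pr}\right) \times B\!\left(n(B), \frac{q^2}{q^2 + 2qr}\right) \tag{1.2} $$

where $B(M, \theta)$ stands for the Binomial distribution with $M$ trials and probability
of success $\theta$. The two success probabilities are the conditional probabilities of
the genotypes AA and BB given the phenotypes A and B, as read off from the table above.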

Notice that equations (1.1) and (1.2) give us the conditional distributions
$\mathcal{L}\{Y \mid \text{data}, Z\}$ and $\mathcal{L}\{Z \mid \text{data}, Y\}$. The
Gibbs sampler, which is a special case of the Markov Chain Monte Carlo method, can be
used to obtain a Markov chain $X_0 = (Y_0, Z_0), X_1 = (Y_1, Z_1), \ldots$ such that the
distribution of $X_n = (Y_n, Z_n)$ will converge to
$\mathcal{L}\{(Y, Z) \mid \text{data}\}$ as $n \to \infty$. By considering just the
marginals $Y_0, Y_1, \ldots$ we see that $Y_n$ converges in distribution to the required
posterior distribution
$\mathcal{L}\{(p, q, r) \mid \text{data}\} = \mathcal{L}\{Y \mid \text{data}\}$. We could
also use other methods based on Markov chain theory to obtain better approximations to
$\mathcal{L}\{Y \mid \text{data}\}$. Finally, we can approximate $E(Y \mid \text{data})$,
which is the Bayes estimate of the vector $(p, q, r)$. This would be the Bayesian answer
to the method of scoring for maximum likelihood estimates.

How is the Markov chain $X_0, X_1, \ldots$ generated? Fix arbitrary values for
$(Y_0, Z_0)$. For $n = 0, 1, \ldots$ generate $Y_{n+1}$ from the distribution
$\mathcal{L}\{Y \mid \text{data}, Z_n\}$ as given in (1.1) and generate $Z_{n+1}$ from the
distribution $\mathcal{L}\{Z \mid \text{data}, Y_{n+1}\}$ as given in (1.2). This way we
generate $(Y_1, Z_1), (Y_2, Z_2), \ldots$. It is easy to see that this is a Markov chain
whose transition function can be expressed in terms of
$\mathcal{L}\{Y \mid \text{data}, Z\}$ and $\mathcal{L}\{Z \mid \text{data}, Y\}$, and that
$\mathcal{L}\{(Y, Z) \mid \text{data}\}$ is an invariant distribution for this transition
function.
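
The alternation between (1.1) and (1.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `gibbs_blood_groups`, the uniform prior parameters, the burn-in length and the counts passed to it are all assumptions made for the example and are not taken from Rao's book or from this paper.

```python
import numpy as np

def gibbs_blood_groups(n_O, n_A, n_B, n_AB, alpha=(1.0, 1.0, 1.0),
                       n_iter=5000, burn_in=500, seed=0):
    """Alternate between the conditionals (1.1) and (1.2); return draws of (p, q, r)."""
    rng = np.random.default_rng(seed)
    a1, a2, a3 = alpha
    # Arbitrary starting value Z_0 = (n(AA), n(BB)).
    n_AA, n_BB = n_A // 2, n_B // 2
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # (1.1): Y | data, Z is Dirichlet(alpha_1', alpha_2', alpha_3').
        p, q, r = rng.dirichlet([
            a1 + n_AA + n_A + n_AB,
            a2 + n_BB + n_B + n_AB,
            a3 + 2 * n_O + (n_A - n_AA) + (n_B - n_BB),
        ])
        # (1.2): Z | data, Y is a product of two binomials.
        # p / (p + 2r) equals p^2 / (p^2 + 2pr), and similarly for q.
        n_AA = rng.binomial(n_A, p / (p + 2.0 * r))
        n_BB = rng.binomial(n_B, q / (q + 2.0 * r))
        draws[t] = (p, q, r)
    # Discard an initial stretch of the chain (an assumed, ad hoc burn-in).
    return draws[burn_in:]

# Hypothetical counts, for illustration only.
draws = gibbs_blood_groups(n_O=176, n_A=182, n_B=60, n_AB=17)
bayes_estimate = draws.mean(axis=0)  # approximates E(Y | data)
print(bayes_estimate)
```

Averaging the retained draws approximates the Bayes estimate of (p, q, r); whether such averages converge is exactly the question addressed in the remainder of the paper.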

2  The  Second  Example 

Let $Y = (Y_1, Y_2, \ldots, Y_n)$ be i.i.d. random variables with unknown distribution
$P$. Let $C_1, C_2, \ldots, C_n$ be subsets of the real line, some of which may be
singleton sets. Suppose that the data gives the information $Y_i \in C_i$,
$i = 1, \ldots, n$. If the $C_i = \{c_i\}$, $i = 1, \ldots, n$, are all singletons then we
have observed the actual values of $Y_1, \ldots, Y_n$. If some $C_i$'s are singletons and
the other $C_i$'s are sets of the form $[c_i, \infty)$, then this corresponds to the case
of right censoring. The frequentist solution to estimating $P$ is the usual Kaplan-Meier
estimate. What if one were a Bayesian, and one uses a Dirichlet prior
$\mathcal{D}_\alpha$, where $\alpha$ is a finite measure on the real line? Suppose that
there are $m$ uncensored observations and $n - m$ censored observations, that is, without
loss of generality, that $C_1 = \{c_1\}, \ldots, C_m = \{c_m\}$ are singletons and the
remaining $C_i$'s are not singletons. Then, the data gives us the information that
$Y_1 = c_1, \ldots, Y_m = c_m$ and that $Y_{m+1} \in C_{m+1}, \ldots, Y_n \in C_n$. Let
$V = (Y_{m+1}, \ldots, Y_n)$ be the actual unobserved values of the censored observations.
What is the posterior distribution of $P$ given the data? From the standard theory of
Dirichlet distributions, see for instance Ferguson (1972) or Sethuraman (1994), the
posterior distribution of $P$ given $Y_1 = c_1, \ldots, Y_m = c_m$ is
$\mathcal{D}_{\alpha'}$ where $\alpha' = \alpha + \sum_{i=1}^{m} \delta_{c_i}$. As before,
the posterior distribution of $P$ given the data is not tractable. For any probability
measure $\mu$ and set $B$ with $\mu(B) > 0$, let $\mu_B(C) = \mu(B \cap C)/\mu(B)$ be the
restriction of $\mu$ to $B$. Then we have the following two facts:


the conditional distribution of $P$ given {data, $V$} is $\mathcal{D}_\beta$, where $\beta = \alpha' + \sum_{i=m+1}^{n} \delta_{Y_i}$,   (2.1)

and

the conditional distribution of $V$ given {data, $P$} is $\prod_{i=m+1}^{n} P_{C_i}$.   (2.2)

Starting from arbitrary values $(P_0, V_0)$, we can carry out the Markov Chain Monte Carlo
method to generate a Markov chain $(P_1, V_1), (P_2, V_2), \ldots$. We can hope that its
distribution will converge to the joint distribution of $(P, V)$ given the data, and the
required posterior distribution is obtained from here by taking the marginal distribution
of $P$. A crucial intermediate step in this Markov Chain Monte Carlo method follows from
the constructive definition given in Sethuraman (1994). See Doss (1996) for details. Once
again, the question arises whether this Markov chain will converge to the desired
conditional distribution.


3  The  Markov  Chain  Monte  Carlo  Method 

Examples such as the ones described in Sections 1 and 2 arise in many areas of Statistics.
In each of these problems there is a probability distribution $\pi$ on a measurable space
$(\mathcal{X}, \mathcal{B})$, and we are interested in estimating characteristics of it
such as $\pi(E)$ or $\int f \, d\pi$, where $E \in \mathcal{B}$ and $f$ is a bounded
measurable function. Even when $\pi$ is fully specified one may have to resort to methods
like Monte Carlo simulation, especially when $\pi$ is not computationally tractable. For
this one uses the huge available literature on generation of random variables from an
explicitly or implicitly described probability distribution $\pi$. Generally these methods
require $\mathcal{X}$ to be the real line or require that $\pi$ have special features,
such as a structure in terms of independent real valued random variables. When one cannot
generate random variables with distribution $\pi$ one has to be satisfied with looking for
a sequence of random variables $X_1, X_2, \ldots$ whose distributions converge to $\pi$
and using $X_n$ with a large index $n$ as an observation from $\pi$. This is called the
Markov Chain Monte Carlo Method. The preceding discussion of blood group data from Dr.
C. R. Rao's book is an illustrative example. In this example, $\pi$ is the posterior
distribution $\mathcal{L}\{(Y, Z) \mid \text{data}\}$ and an example of a functional of
interest is $E(Y \mid \text{data})$, the Bayes estimate of $Y$.

Let $P$ be a transition probability function on a measurable space
$(\mathcal{X}, \mathcal{B})$, i.e. $P$ is a function on $\mathcal{X} \times \mathcal{B}$
such that for each $x \in \mathcal{X}$, $P(x, \cdot)$ is a probability measure on
$(\mathcal{X}, \mathcal{B})$, and for each $C \in \mathcal{B}$, $P(\cdot, C)$ is a
measurable function on $(\mathcal{X}, \mathcal{B})$. Suppose that $\pi$ is a probability
measure on $(\mathcal{X}, \mathcal{B})$ which is invariant for the Markov chain, i.e.

$$ \pi(C) = \int P(x, C) \, \pi(dx) \quad \text{for all } C \in \mathcal{B}. \tag{3.1} $$

We fix a starting point $x_0$, generate an observation $X_1$ from $P(x_0, \cdot)$,
generate an observation $X_2$ from $P(X_1, \cdot)$, etc. This generates the Markov chain
$x_0 = X_0, X_1, X_2, \ldots$. In order to make use of the Markov chain
$\{X_n\}_{n=0}^{\infty}$ to get some information about $\pi$, one needs results of the
form:


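(a) Ergodicity: For all or for "most" starting values $x$, the distribution of $X_n$
converges to $\pi$ in a suitable sense, for example

(a1) Variation norm ergodicity: $\sup_{C \in \mathcal{B}} |P^n(x, C) - \pi(C)| \to 0$, or

(a2) Variation norm mean ergodicity: $\sup_{C \in \mathcal{B}} \left| \frac{1}{n} \sum_{j=1}^{n} P^j(x, C) - \pi(C) \right| \to 0$.

(b) Law of large numbers: For all or for most starting values $x$, for each
$C \in \mathcal{B}$,

$$ \frac{1}{n} \sum_{j=1}^{n} I_C(X_j) \to \pi(C) \quad \text{for a.e. realization of the chain,} $$

and for each $f$ with $\int |f| \, d\pi < \infty$,

$$ \frac{1}{n} \sum_{j=1}^{n} f(X_j) \to \int f \, d\pi \quad \text{for a.e. realization of the chain.} $$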


Then, we may estimate $\pi$ for example by generating $G$ such chains in parallel,
obtaining independent observations $X_n^{(1)}, \ldots, X_n^{(G)}$, or by running one (or a
few) very long chains.
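
On a finite state space both kinds of result can be checked directly. The sketch below uses a hypothetical 3-state transition matrix (an assumption made purely for illustration): it computes the invariant distribution of (3.1), tracks the variation distance of (a1), and compares occupation frequencies along one simulated path with the invariant distribution, as in (b).

```python
import numpy as np

# Hypothetical 3-state transition matrix; row x is the distribution P(x, .).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

# Invariant distribution: left eigenvector of P for eigenvalue 1, cf. (3.1).
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

def variation_distance(mu, nu):
    # On a finite space, sup_C |mu(C) - nu(C)| = (1/2) * sum_y |mu(y) - nu(y)|.
    return 0.5 * np.abs(mu - nu).sum()

# (a1): distance between P^n(x, .) and pi for the starting state x = 0.
Pn = np.eye(3)
tv = []
for n in range(1, 31):
    Pn = Pn @ P
    tv.append(variation_distance(Pn[0], pi))

# (b): occupation frequencies along a single simulated path.
rng = np.random.default_rng(0)
x, counts, n_steps = 0, np.zeros(3), 20000
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])
    counts[x] += 1
freq = counts / n_steps

print(tv[0], tv[-1])
print(pi, freq)
```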


4  Main  Results 

Our goal is to find conditions on a given Markov chain, or rather on its transition
function $P(\cdot, \cdot)$, so that some or all of the conditions (a) and (b) above hold,
assuming that $P$ admits an invariant probability measure $\pi$. In Markov Chain Monte
Carlo applications, the probability measure $\pi$ of interest is by construction the
invariant probability measure of the Markov chain.

When $\{X_n\}$ is a Markov chain with a countable state space, say $\{1, 2, \ldots\}$,
and transition probability matrix $P = (p_{ij})$, the existence of an invariant
probability distribution $\pi$ and the irreducibility condition that there exists a state
$i_0$ such that from any initial state $i$ there is positive probability that the chain
eventually hits $i_0$, are enough to guarantee that (i) the chain $\{X_n\}$ is recurrent
in an appropriate sense, (ii) conditions (b) and (a2) above hold, and (iii) when an
additional aperiodicity condition also holds, then (a1) above also holds. These facts are
well known; see for instance Hoel, Port and Stone (1972).
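
The role of the aperiodicity condition can be seen in the smallest possible example, a two-state chain that alternates deterministically. The sketch below (an illustration, not part of the original argument) shows that the variation distance in (a1) never decreases, while the averaged version in (a2) tends to zero.

```python
import numpy as np

# Two-state chain that alternates deterministically (period 2).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
pi = np.array([0.5, 0.5])  # invariant: pi @ P equals pi

def variation_distance(mu, nu):
    # On a finite space, sup_C |mu(C) - nu(C)| = (1/2) * sum_y |mu(y) - nu(y)|.
    return 0.5 * np.abs(mu - nu).sum()

Pn = np.eye(2)
running_sum = np.zeros(2)
a1, a2 = [], []
for n in range(1, 201):
    Pn = Pn @ P
    running_sum += Pn[0]  # sum over j <= n of P^j(0, .)
    a1.append(variation_distance(Pn[0], pi))
    a2.append(variation_distance(running_sum / n, pi))

print(a1[-1])  # distance in (a1) at n = 200: it does not shrink
print(a2[-1])  # distance in (a2) at n = 200: the average has reached pi
```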
A natural question is whether this is true for general state space Markov chains. In
particular, when (3.1) holds, is there a form of the irreducibility condition under which
some or all of (a) and (b) above hold?

The Markov chain literature has a number of results in this direction; see Orey (1971),
Athreya and Ney (1978) and Nummelin (1984). Under a condition known as Harris recurrence
(see below) the existence of an invariant distribution $\pi$ implies mean ergodicity
(condition (a2)) and the laws of large numbers (condition (b)). Unfortunately, Harris
recurrence is not an easy condition to verify in general, and it is much stronger than
irreducibility.

The main point of this paper is to show that when (3.1) holds, a simple irreducibility
condition ((4.3) below) is enough to yield (a2) and (b). An additional aperiodicity
condition yields (a1) as well. This provides a complete generalization of the results for
the countable case. It is worth noting that recurrence emerges as a consequence of (3.1)
and the irreducibility condition (4.3), and is not imposed as a hypothesis.

Before stating our main theorems, we will need a few definitions. For any set
$C \in \mathcal{B}$, let $N_n(C) = \sum_{m=1}^{n} I(X_m \in C)$ and
$N(C) = \sum_{m=1}^{\infty} I(X_m \in C)$ be the number of visits to $C$ by time $n$ and
the total number of visits to $C$, respectively. The expectations of $N_n(C)$ and $N(C)$,
when the chain starts at $x$, are given by $G_n(x, C) = \sum_{m=1}^{n} P^m(x, C)$ and
$G(x, C) = \sum_{m=1}^{\infty} P^m(x, C)$, respectively. Define
$T(C) = \inf\{n : n > 0, X_n \in C\}$ to be the first time the chain hits $C$ after time
0. Note that $P_x(T(C) < \infty) > 0$ is equivalent to $G(x, C) > 0$.

The set $A \in \mathcal{B}$ is said to be accessible from $x$ if
$P_x(T(A) < \infty) > 0$. Let $\rho$ be a probability measure on
$(\mathcal{X}, \mathcal{B})$. The Markov chain is said to be $\rho$-recurrent (or Harris
recurrent with respect to $\rho$) if for every $A$ with $\rho(A) > 0$,
$P_x(T(A) < \infty) = 1$ for all $x \in \mathcal{X}$. The chain is said to be
$\rho$-irreducible if every set $A$ with $\rho(A) > 0$ is accessible from all
$x \in \mathcal{X}$. The set $A$ is said to be recurrent if $P_x(T(A) < \infty) = 1$ for
all $x \in \mathcal{X}$.

For the case where the $\sigma$-field $\mathcal{B}$ is separable, there is a very useful
equivalent definition of $\rho$-irreducibility of a Markov chain. In this case, we can
deduce from Theorem 2.1 of Orey (1971), on the existence of "C-sets," that
$\rho$-irreducibility of a Markov chain implies that there exist a set
$A \in \mathcal{B}$ with $\rho(A) > 0$, an integer $n_0$, and a number $\epsilon > 0$
satisfying

$$ P_x(T(A) < \infty) > 0 \quad \text{for all } x \in \mathcal{X}, \tag{4.1} $$

and

$$ x \in A, \; C \in \mathcal{B} \quad \text{imply} \quad P^{n_0}(x, C) \geq \epsilon \, \rho(C \cap A). \tag{4.2} $$

Let $\rho_A(C) = \rho(C \cap A)/\rho(A)$. This is well defined because $\rho(A) > 0$. The
set function $\rho_A$ is a probability measure satisfying $\rho_A(A) = 1$. Note that (4.1)
simply states that $A$ is accessible from all $x \in \mathcal{X}$ and this condition does
not make reference to the probability measure $\rho$. Condition (4.2) states that
uniformly in $x \in A$, the $n_0$-step transition probabilities from $x$ into subsets of
$A$ are bounded below by $\epsilon$ times $\rho$. That (4.1) and (4.2) imply
$\rho_A$-irreducibility is, of course, immediate. This alternative definition of
$\rho_A$-irreducibility, which applies to nonseparable $\sigma$-fields as well, will
usually be much easier to verify in Markov chain simulation problems. By replacing $\rho$
by $\rho_A$, we can also assume with no loss of generality that $\rho$ is a probability
measure with $\rho(A) = 1$ when verifying Condition (4.2).
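
For a finite state space, conditions (4.1) and (4.2) can be checked mechanically. The following sketch is one way to do so under assumed inputs: the helper names `minorization` and `accessible_from_all`, the 3-state matrix and the choice of $A$ are all hypothetical. It takes the largest measure lying below every row of the $n_0$-step matrix on $A$, which yields one valid pair $(\epsilon, \rho)$ in (4.2), not necessarily the only one.

```python
import numpy as np

def minorization(P, A, n0=1):
    """Return (eps, rho) such that P^n0(x, C) >= eps * rho(C & A) for all x in A.

    rho is a probability vector supported on A. A value eps == 0 means that
    this particular choice of A and n0 gives no usable bound.
    """
    Pn = np.linalg.matrix_power(P, n0)
    A = np.asarray(A)
    lower = np.zeros(P.shape[0])
    # Column-wise minimum over the rows x in A, restricted to columns in A.
    lower[A] = Pn[np.ix_(A, A)].min(axis=0)
    eps = lower.sum()
    rho = lower / eps if eps > 0 else lower
    return eps, rho

def accessible_from_all(P, A):
    """Check (4.1): from every state, A is hit at some time n >= 1 with positive probability."""
    step = P > 0
    hit = step[:, np.asarray(A)].any(axis=1)   # enters A after exactly one step
    for _ in range(P.shape[0]):
        hit = hit | step[:, hit].any(axis=1)   # or after one further step
    return bool(hit.all())

# Hypothetical 3-state chain and candidate set A = {0, 1}.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
A = [0, 1]
eps, rho = minorization(P, A)
print(eps, rho, accessible_from_all(P, A))
```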

We denote the greatest common divisor of any subset $\mathcal{M}$ of integers by
g.c.d.$(\mathcal{M})$.

We now state two theorems which hold for general Markov chains. They give sufficient
conditions for the Markov Chain Monte Carlo method to be successful and constitute the
main results of this paper. The proofs of these theorems can be found in Athreya, Doss
and Sethuraman (1996).


Theorem 1 Suppose that the Markov chain $\{X_n\}$ with transition function $P(x, C)$ has
an invariant probability measure $\pi$, i.e. (3.1) holds. Suppose that there is a set
$A \in \mathcal{B}$, a probability measure $\rho$ with $\rho(A) = 1$, a constant
$\epsilon > 0$, and an integer $n_0 \geq 1$ such that

$$ \pi\{x : P_x(T(A) < \infty) > 0\} = 1, \tag{4.3} $$

and

$$ P^{n_0}(x, \cdot) \geq \epsilon \, \rho(\cdot) \quad \text{for each } x \in A. \tag{4.4} $$

Suppose further that

$$ \text{g.c.d.}\{m : \text{there is an } \epsilon_m > 0 \text{ such that } P^m(x, \cdot) \geq \epsilon_m \rho(\cdot) \text{ for each } x \in A\} = 1. \tag{4.5} $$

Then there is a set $D$ such that

$$ \pi(D) = 1 \quad \text{and} \quad \sup_{C \in \mathcal{B}} |P^n(x, C) - \pi(C)| \to 0 \quad \text{for each } x \in D. \tag{4.6} $$

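Theorem 2 Suppose that the Markov chain $\{X_n\}$ with transition function $P(x, C)$
satisfies Conditions (3.1), (4.3) and (4.4). Then

$$ \sup_{C \in \mathcal{B}} \left| \frac{1}{n_0} \sum_{r=0}^{n_0 - 1} P^{m n_0 + r}(x, C) - \pi(C) \right| \to 0 \quad \text{as } m \to \infty \text{ for } [\pi]\text{-almost all } x, \tag{4.7} $$

and hence

$$ \sup_{C \in \mathcal{B}} \left| \frac{1}{n} \sum_{j=1}^{n} P^j(x, C) - \pi(C) \right| \to 0 \quad \text{as } n \to \infty \text{ for } [\pi]\text{-almost all } x. \tag{4.8} $$

Let $f(x)$ be a measurable function on $(\mathcal{X}, \mathcal{B})$ such that
$\int \pi(dy) |f(y)| < \infty$. Then

$$ P_x\left\{ \frac{1}{n} \sum_{j=1}^{n} f(X_j) \to \int \pi(dy) f(y) \right\} = 1 \quad \text{for } [\pi]\text{-almost all } x \tag{4.9} $$

and

$$ \frac{1}{n} \sum_{j=1}^{n} E_x(f(X_j)) \to \int \pi(dy) f(y) \quad \text{for } [\pi]\text{-almost all } x. \tag{4.10} $$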


Variants of these theorems form a main core of interest in the Markov chain literature.
However, most of this literature makes strong assumptions such as the existence of a
recurrent set $A$ and proves the existence of an invariant probability measure before
establishing (4.6) and (4.7). Theorems 1 and 2 exploit the existence of an invariant
probability measure, which is given to us "for free" in the Markov chain simulation
context, and establish ergodicity or mean ergodicity under minimal and easily verifiable
assumptions. For example, we have already noted that in the context of the Markov chain
simulation method, we really need to check only (4.3), (4.4), and (4.5). To show (4.3), in
most cases one will establish that $P_x(T(A) < \infty) > 0$ for all $x$. Condition (4.5)
is usually called the aperiodicity condition and is automatically satisfied if (4.4) holds
with $n_0 = 1$. Condition (4.4) holds if for each $x \in A$, $P^{n_0}(x, \cdot)$ has a
non-trivial absolutely continuous component with respect to some measure $\rho$ and the
associated density $p^{n_0}(x, y)$ satisfies $\inf_{x, y \in A} p^{n_0}(x, y) > 0$ for
some $A$ with $\rho(A) > 0$.

5 Back to the Illustrative Examples

Consider the special case where $\mathcal{X} = \mathcal{X}^{(1)} \times \mathcal{X}^{(2)}$
and $\mathcal{B} = \mathcal{B}^{(1)} \times \mathcal{B}^{(2)}$. We give below two theorems
that give easily verifiable conditions for the conclusions of Theorems 1 and 2 to hold. We
will use one of these theorems to show that the conditions of Theorems 1 and 2 hold in the
example of Section 1 on blood group data.

Consider the Markov Chain Monte Carlo algorithm for generating observations from the joint
distribution $\pi$ of $(X^{(1)}, X^{(2)})$ as described in Section 3.

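Theorem 3 Suppose that the conditional distributions $\pi_{X^{(1)}|X^{(2)}=x^{(2)}}$ and
$\pi_{X^{(2)}|X^{(1)}=x^{(1)}}$ have densities, say
$p_{X^{(1)}|X^{(2)}}(x^{(1)}|x^{(2)})$ and $p_{X^{(2)}|X^{(1)}}(x^{(2)}|x^{(1)})$,
respectively with respect to some dominating measures $\rho^{(1)}$ and $\rho^{(2)}$.
Suppose further that for each $i = 1, 2$ there is a set $A^{(i)}$ with
$\rho^{(i)}(A^{(i)}) > 0$, and a $\delta > 0$ such that

$$ p_{X^{(1)}|X^{(2)}}(x^{(1)}|x^{(2)}) > 0 \quad \text{whenever } x^{(1)} \in A^{(1)} \text{ and } x^{(2)} \text{ is arbitrary,} \tag{5.1} $$

and

$$ p_{X^{(1)}|X^{(2)}}(x^{(1)}|x^{(2)}) > \delta \text{ and } p_{X^{(2)}|X^{(1)}}(x^{(2)}|x^{(1)}) > \delta \quad \text{whenever } x^{(1)} \in A^{(1)}, \; x^{(2)} \in A^{(2)}. \tag{5.2} $$

Then Conditions (4.3) and (4.4) are satisfied with $n_0 = 1$. Thus, (4.5) is also
satisfied, and the conclusions of Theorems 1 and 2 hold.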

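Theorem 4 Suppose that the conditional distribution $\pi_{X^{(2)}|X^{(1)}=x^{(1)}}$ has a
density, say $p_{X^{(2)}|X^{(1)}}(x^{(2)}|x^{(1)})$, with respect to some dominating
measure $\rho^{(2)}$. Suppose that there are sets $A^{(1)}$ and $A^{(2)}$ and a
$\delta > 0$ such that

$$ \pi_{X^{(1)}|X^{(2)}}(A^{(1)}|x^{(2)}) > 0 \quad \text{for all } x^{(2)}, \tag{5.3} $$

$$ \pi_{X^{(1)}|X^{(2)}}(A^{(1)}|x^{(2)}) > \delta \quad \text{for all } x^{(2)} \in A^{(2)}, \tag{5.4} $$

and

$$ p_{X^{(2)}|X^{(1)}}(x^{(2)}|x^{(1)}) > \delta \quad \text{whenever } x^{(1)} \in A^{(1)}, \; x^{(2)} \in A^{(2)}. \tag{5.5} $$

Then Conditions (4.3) and (4.4) are satisfied with $n_0 = 1$. Thus, (4.5) is also
satisfied, and the conclusions of Theorems 1 and 2 hold.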

We will verify the conditions of Theorem 3 in the example of blood group data from humans
in Section 1. Here $\mathcal{X}^{(1)} = \mathbb{R}^3$,
$\mathcal{X}^{(2)} = \{0, \ldots, n(A)\} \times \{0, \ldots, n(B)\}$, $X^{(1)} = Y$ and
$X^{(2)} = Z$. We can take $\rho^{(1)}$ as the Lebesgue measure on $\mathbb{R}^3$ and
$\rho^{(2)}$ as the counting measure on $\mathcal{X}^{(2)}$. Put
$A^{(1)} = [0.23, 0.43] \times [0.23, 0.43] \times [0.23, 0.43]$ and
$A^{(2)} = \mathcal{X}^{(2)}$. It is easy to verify that $\rho^{(i)}(A^{(i)}) > 0$ for
$i = 1, 2$ and that conditions (5.1) and (5.2) are satisfied for some $\delta > 0$, since
the Dirichlet distribution has a positive density function, $A^{(2)}$ is a finite set and
the Binomial distribution with parameter in $[0.23, 0.43]$ has a positive frequency
function.
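
The binomial half of this verification can be illustrated numerically. The sketch below searches a grid over the part of $A^{(1)}$ with $p + q + r = 1$ and over every point of the finite set $A^{(2)}$, and reports the smallest value of the frequency function of $Z$ given $Y$ from (1.2). The function names, the grid resolution and the counts are assumptions made for the example, and a grid minimum is only suggestive of a lower bound $\delta$, not a proof of one.

```python
from math import comb
import itertools

def binom_pmf(k, n, theta):
    # Binomial frequency function with n trials and success probability theta.
    return comb(n, k) * theta**k * (1.0 - theta)**(n - k)

def success_prob(x, r):
    # Success probability in (1.2): x^2 / (x^2 + 2xr) = x / (x + 2r), x = p or q.
    return x / (x + 2.0 * r)

def grid_min_density(n_A, n_B, lo=0.23, hi=0.43, grid=9):
    """Smallest value of the frequency function of Z given Y over a grid of
    points (p, q, r) in A^(1) with p + q + r = 1 and over every Z in A^(2)."""
    pts = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    best = 1.0
    for p, q in itertools.product(pts, pts):
        r = 1.0 - p - q
        if not (lo <= r <= hi):
            continue  # (p, q, r) falls outside A^(1)
        for z1 in range(n_A + 1):
            for z2 in range(n_B + 1):
                val = (binom_pmf(z1, n_A, success_prob(p, r))
                       * binom_pmf(z2, n_B, success_prob(q, r)))
                best = min(best, val)
    return best

delta = grid_min_density(n_A=6, n_B=4)  # small hypothetical counts
print(delta)
```

The minimum is strictly positive because every success probability on the grid lies strictly between 0 and 1 and the set of values of $Z$ is finite, which is the argument used in the text.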

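The verification of the conditions of Theorems 1 and 2 for the Bayesian solution to the
Kaplan-Meier problem is not quite straightforward since the state space of $(P, V)$ is
more complicated. We therefore consider only the Markov chain $V_0, V_1, V_2, \ldots$
whose state space is $\times_{i=m+1}^{n} C_i$. By using (2.1) and (2.2), we see that the
probability that $\{V_r \in \times_{i=m+1}^{n} B_i\}$ given $V_{r-1}$ is given by

$$
\begin{aligned}
\text{Prob}\{V_r \in \times_{i=m+1}^{n} B_i \mid V_{r-1}\}
&= \int \prod_{i=m+1}^{n} P_{C_i}(B_i \cap C_i) \, \mathcal{D}_\gamma(dP) \\
&\geq \int \prod_{i=m+1}^{n} P(B_i \cap C_i) \, \mathcal{D}_\gamma(dP) \\
&\geq \frac{\prod_{i=m+1}^{n} \gamma(B_i \cap C_i)}{\prod_{i=m+1}^{n} [\gamma(\mathbb{R}) + i - 1]} \\
&\geq \theta \prod_{i=m+1}^{n} \alpha_{C_i}(B_i),
\end{aligned}
$$

where $\gamma = \alpha' + \sum_{i=m+1}^{n} \delta_{V_{r-1,i}}$, with $V_{r-1,i}$ denoting
the coordinate of $V_{r-1}$ that corresponds to $Y_i$, and
$\theta = \prod_{i=m+1}^{n} \alpha(C_i) \big/ \prod_{i=m+1}^{n} [\gamma(\mathbb{R}) + i - 1] > 0$
is a quantity independent of $V_{r-1}$. This verifies conditions (4.3) and (4.4) with
$n_0 = 1$ for the chain $\{V_r\}$. Thus, (4.5) is also satisfied, and the conclusions of
Theorems 1 and 2 hold. The distribution of $(P_r, V_r)$ is a continuous function of the
distribution of $V_{r-1}$ not depending on $r$ and can be written down as follows by using
(2.1) and (2.2) once again:

$$ \text{Prob}\{P_r \in \mathcal{P}, \; V_r \in \times_{i=m+1}^{n} B_i \mid V_{r-1}\} = \int_{\mathcal{P}} \prod_{i=m+1}^{n} P_{C_i}(B_i) \, \mathcal{D}_\gamma(dP) $$

for any measurable set $\mathcal{P}$ of probability measures, where
$\gamma = \alpha' + \sum_{i=m+1}^{n} \delta_{V_{r-1,i}}$. The convergence of the
distributions of $\{V_r\}$, together with the above remark, implies the convergence in
distribution of the whole Markov chain $(P_0, V_0), (P_1, V_1), (P_2, V_2), \ldots$.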